Topic Identification in Discourse
Abstract
This paper proposes a corpus-based language model for topic identification. We analyze the association of noun-noun and noun-verb pairs in the LOB Corpus. The word association norms are based on three factors: 1) word importance, 2) pair co-occurrence, and 3) distance. They are trained at the paragraph level for noun-noun pairs and at the sentence level for noun-verb pairs. Under the topic coherence postulation, the nouns that have the strongest connectivities with the other nouns and verbs in the discourse form the preferred topic set. The collocational semantics is then used to identify the topics of paragraphs and to discuss the topic shift phenomenon among paragraphs.
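The abstract names the three factors but not how they are combined, so the following is only a minimal sketch under stated assumptions, not the paper's model. It assumes an IDF-style score for word importance, sums inverse token distance over co-occurrences (combining the co-occurrence and distance factors), trains noun-noun scores within paragraphs and noun-verb scores within sentences, and ranks each noun in a paragraph by its total connectivity to the other nouns and verbs there. The input format (sentences as lists of `(word, tag)` pairs with tags `"N"`/`"V"`) and the functions `idf`, `pair_scores`, `train`, and `topics` are hypothetical; the toy corpus is invented for illustration and is not LOB data.

```python
import math
from collections import defaultdict

# Hypothetical input format: a paragraph is a list of sentences; a sentence
# is a list of (word, tag) pairs, tag "N" for nouns and "V" for verbs.

def idf(units):
    """Assumed word-importance factor: smoothed IDF over the given units."""
    df = defaultdict(int)
    for unit in units:
        for w in {w for w, _ in unit}:
            df[w] += 1
    n = len(units)
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}

def pair_scores(units, left_tag, right_tag, imp):
    """Sum imp(a) * imp(b) / distance over every co-occurrence of (a, b) in a unit."""
    scores = defaultdict(float)
    for unit in units:
        for i, (a, ta) in enumerate(unit):
            if ta != left_tag:
                continue
            for j, (b, tb) in enumerate(unit):
                if j == i or tb != right_tag or a == b:
                    continue
                scores[(a, b)] += imp.get(a, 1.0) * imp.get(b, 1.0) / abs(i - j)
    return scores

def train(paragraphs):
    """Noun-noun scores at paragraph level, noun-verb scores at sentence level."""
    para_units = [[tok for sent in p for tok in sent] for p in paragraphs]
    sent_units = [sent for p in paragraphs for sent in p]
    imp = idf(para_units)
    nn = pair_scores(para_units, "N", "N", imp)
    nv = pair_scores(sent_units, "N", "V", imp)
    return nn, nv

def topics(paragraph, nn, nv, k=2):
    """Return the k nouns with the highest connectivity to the paragraph's other nouns and verbs."""
    tokens = [tok for sent in paragraph for tok in sent]
    nouns = {w for w, t in tokens if t == "N"}
    verbs = {w for w, t in tokens if t == "V"}
    conn = {
        n: sum(nn.get((n, m), 0.0) for m in nouns if m != n)
        + sum(nv.get((n, v), 0.0) for v in verbs)
        for n in nouns
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(conn, key=lambda n: (-conn[n], n))[:k]

# Invented toy corpus: two paragraphs.
corpus = [
    [[("bank", "N"), ("raise", "V"), ("rate", "N")],
     [("rate", "N"), ("affect", "V"), ("loan", "N")]],
    [[("river", "N"), ("flood", "V"), ("bank", "N")]],
]
nn, nv = train(corpus)
print(topics(corpus[0], nn, nv, k=1))  # prints ['rate']
```

Comparing the top-ranked nouns of consecutive paragraphs (here "rate" and then "river") gives one simple way to observe a topic shift, although the paper's own treatment of topic shift may differ.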
Similar resources
A Corpus-Based Approach to Text Partition
A text partition model is proposed to determine the boundaries of discourse structures. It is based on the association of noun-noun and noun-verb relations defined at the discourse and sentence levels. Three factors are considered: 1) repetition of words, 2) importance of words, and 3) collocational semantics. Ten texts serve as experimental objects. The applications of the results to se...
Identification of Direct/Indirect Discourse in Children's Stories
The automatic identification of direct and indirect discourse is a topic not yet explored in Natural Language Processing. We developed the DID system, which, when applied to children's stories, identifies the discourse relative to the narrator (indirect discourse) or to the characters taking part in the story (direct discourse). This automation can be advantageous, namely when it is necessary to t...
The Life and Death of Discourse Entities: Identifying Singleton Mentions
A discourse typically involves numerous entities, but few are mentioned more than once. Distinguishing discourse entities that die out after just one mention (singletons) from those that lead longer lives (coreferent) would benefit NLP applications such as coreference resolution, protagonist identification, topic modeling, and discourse coherence. We build a logistic regression model for predic...
Sequence Models and Ranking Methods for Discourse Parsing
A dissertation presented to the Faculty of the Graduate School of Arts and Sciences of Brandeis University, Waltham, Massachusetts, by Ben Wellner. Many important aspects of natural language reside beyond the level of a single sentence or clause, at the level of the discourse, including: reference relations such as anaphora, notions of topic/...
Unsupervised Topic Modelling for Multi-Party Spoken Discourse
We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party...